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Introduction 



In different applications (health, finance,...), abrupt changes on the spectral density of long memory 
processes provide relevant information. In this work, we concern ourself with off-line detection. 
However, our method is close to the sliding window which is typically a sequential analysis method. 

We model data by Gaussian processes with locally stationary and long memory increments. By 
using a wavelet analysis, one obtains a series with short memory. We compare numerically the 
efficiency of different methods for off-line detection of these changes, namely penalized least square 
estimators introduced by Bai and Perron (1998) versus a modification of the filtered derivative 
introduced by Basseville and Nikiforov (1993). The enhancement consists in computing the p-value 
of every change point and then apply an adaptive strategy. 

Since estimation of abrupt changes on spectral density is a specific problem, we first study a 
more standard model. In Section 1, we concern ourself to off-line detection of abrupt changes in 
the mean of independent Gaussian variables with known variance and we numerically compare the 
efficiency of the different estimators in this case. In Section 2, we recall the definition of Gaussian 
processes with locally stationary increments and the properties of their wavelet coefficients. Then, 
we compare the different off-line detection methods on simulated locally fBm and present some 
results on real data. 

1 A toy model: off-line detection of abrupt change in the 
mean or independent Gaussian variables 

Let (Xi)i=i jy be a sequence of independent Gaussian r.v. with mean ^ and a known variance a 2 . 
We assume that the map i t-> fit is piecewise constant, i.e. there exists a configuration = tq < 
Ti < • • • < tk < t~k+i = N such that \ii = for < i < t^+i- The integer K corresponds 
to the number of changes. However, in any real life situation, the number of abrupt changes K is 
unknown, leading to a problem of model selection. 

There is a huge literature on this problem, see for instance the textbook of Basseville & Nikiforov 
(1993). Popular methods are those based on penalized least square criterion (PLSC). We refer to 
Birge & Massart (2006) for a good summary of the problem. Other classical references are Lavielle 
& Moulines (2000) or Lebarbier (2005) or Lavielle & Teyssiere (2006). 

From a numerical point of view, the least square methods are based on dynamic programming 
algorithm. Thus we have to compute a matrix of size N . Therefore, the time and memory com- 
plexity of these algorithms is in 0(N 2 ), which becomes an important limitation with the computer 
progress. This has lead us to investigate the properties of a different algorithm. 



Filtered derivative with p-value method (FDp-VM) 

Filtered derivative method is based on the difference between the empirical mean computed on two 
sliding windows respectively at the right and at the left of the index k, both of size A, see [HQ]. This 
difference corresponds to a sequence (D(A, k))A<k<N-A defined by D(A, t) — jj,(A, t) — fi(A, t — A) 

^ k+A 

where fi(A, k) — — Xj is the empirical mean of X on the (sliding) box [k + 1, k + A]. These 

j=k+l 
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quantities can easily be calculated by recurrence with complexity O(N). It suffices to remark that 
AD(A, k + 1) = AD(A, k) + X k+A+1 - 2X k + X k _ A+1 . 

From the other hand, note that (D(A, k))A<k<N-A is a sequence of centered r.v., except in the 
vicinity of a change point T k . In this case, there appears a "hat- function" of size 5 k := (/J. k +i — £*fe) 
approximatively located between r k — A and r k + A, see [5]. Nevertheless, this sequence presents 
also hats at points different from (n, . . . , tk), see Figure 2 below. There are false alarms. 

In order to eliminate these false alarms, we propose to calculate the p-values a k associated to 
all detected change points yk) 1<k<K ■ Then, we keep only the point corresponding to a p- value 
lesser than a critical level a cr iu c - For all k G [1, Kmax], the p-values are calculated by using the 

formula a k = <fi ^j=^<\^4f-\D(A k , f k )\^ where A k is an adaptive window chosen as the minimum of 

the distances between f k and these two neighbors f k -i and ffc+i, that is A k = \f k — f k -i\ A\f k +i — f k \. 
The function <p denotes the complementary cumulative distribution function of the normal law and 
effc denotes the empirical variance of X on the box (ffe_i,ffc + i). 

Remark that K max is an integer fixed by the user. It represents the maximal number of change 
points. As soon as possible, K max should be chosen bigger than the true number of change points 
K. By convention, one sets tq = f q = and tk+i = TKmax+i = N . 

So, the novelty of this work consists in discriminating between true and false alarms by attribut- 
ing a p- value to each detected change point. By keeping the change points with a p- value smaller 
than a cr itic, one obtains the same precision as Lavielle & Teyssiere (2006) or Lebarbier (2005). 
Also, the main advantage of this method lies in its memory and time complexity in O(N). 

A numerical simulation 

At first, we give an example on one sample. In the next subsection, this example is plainly confirmed 
by Monte-Carlo simulations. To begin with, for N = 5000 we have simulated one replication of a 
sequence of Gaussian random variable X\, . . . ,Xjv with variance a 2 = 1 and mean /x(i) = g(i/N) 
where g is a piecewise-constant function with five change points such as d k G [0.5, 1.25], see Figure 1 
below. 
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Figure 1: The sequence (Xj)o<i<_/v with change points represented by the vertical line and means 
represented by the horizontal line . 
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On this sample, we have computed the function k i— > \D(A, k)\ with A 
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= 300, see Figure 2. 
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Figure 2: The hat function. 



Both estimators penalized least square criterion (PLSC) and filtered derivative with p-value 
ctcritic = 10~ 4 provide good results, see Figure 3 and the Monte-Carlo simulation below. 



Figure 3: Theoretical value of the piecewise- constant function g (black), and its estimators given by 
PLSC method (blue) and Filtered derivative method with p-value method (red). 



Monte-Carlo simulation 

In this subsection, we have made M = 1000 simulations of independent copies of sequences of 
Gaussian r.v. X^\ . . . , xffi with variance a 2 = 1 and mean fx(i) = g(i/N), for k = 1, . . . , M. On 
each sample, we apply the FDp-V method and the PLSC method. We find the good number of 
changes in 98.1% of all cases for the first method and in 97.9% for the second one. 




Figure 4: Distribution of the estimated number of change points K for M = 1000 realizations. 
Left: Using PLSC method. Right: Using Filtered derivative method 

Then, we compute the mean errors. There are two kinds of mean error : 



• Mean Integrate Square Error (MISE) defined as MLSE = E 



( 



N + 1 



1 



Y,W/N)-g{i/N)\ 2 



N 
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1E\\9 ~ .9lli 2 ([o i]) w hich corresponds to the L 2 ([0, 1]) norm of the difference between the true 
function g and the estimate function g. The estimate function is obtained in two steps : first 
we estimate the configuration of change points (fk) k=1 ^, then we estimate the value of g 
between two successive change points as the empirical mean. 

• Square Error on Change Points (SECP) defined as SECP = IE I ^ |r/- — Tfc| ) , in the case 
where we have found the good number of abrupt changes. 
We have the following results by Monte Carlo simulation 





Square Error on Change Points 


Mean Integrated Squared Error 


FDp-V method 


1.1840 x 10~ 4 


0.0107 


PLSC method 


1.2947 x 10~ 4 


0.0114 



Table 1: Errors (SECP & MISE) given by FDp-V method and PLSC method 



Next, we compare the mean time complexity and the mean memory complexity. We have written 
the two programs in Matlab and have runned it with computer system which has the following 
characteristics: 1.8GHz processor and 512MB memory. The results concerning time and memory 
complexity are given in Table 2. 





Memory allocation (in Megabytes) 


CPU time (in second) 


FDp-V method 


0.04 MB 


0.005 s 


PLSC method 


200 MB 


240 s 



Table 2: Memory and time complexity of Filtered derivative method and PLSC method 



A First conclusion 

On the one hand, both methods have the same accuracy in terms of percentage of selection of the 
exact model, Square Error on the configuration of change points or MISE. On the other hand, the 
filtered derivative with p- value is less expensive in terms of time complexity and memory complexity. 
Indeed, algorithm based on Minimization of penalized least square criterion can use 39% of computer 
memory, while Filtered derivative method only needs 0.008%. This plainly confirms the difference 
of time and memory complexity, i.e. 0(N 2 ) versus O(N). 

Observe that algorithms based on penalized least square are considered by Davis et al. (2008) 
as maximizing a posteriori (MAP) criterion, whereas filtered derivative is based on sliding window 
and could be adapted to sequential detection, see for instance Bertrand (2000) and Bertrand & 
Fleury (2008). 

2 Segmentation on the spectral density estimation of some 
long memory processes 

Our model 

Let X be a Gaussian centered process with stationary increments, it is known, see Cramer Sz Lead- 
better (1967), that this process has the harmonizable representation X(t) = J M (e 4 *^ — l) f 1 ^ 2 (£,)dW({;) 
for all t G M where W{dx) is a complex Brownian measure such that X(t) is a real number for 
all t G Ft. This process has long memory, but its wavelet coefficient d^,{a,b) is a short memory 
Gaussian process where, for a scale and a shift (a, b) G -2?+ x 1R and a wavelet ip with a compact 
time support [Li,^]) one has defined 

d^(a,b) := a' 1 ' 2 J ^ (tz^j X (t) dt (1) 
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Moreover, for any fixed scale a, b i-> (a, b) is a stationary, centered, Gaussian process with variance 
If (a) := / |-0(a;)| 2 f(x/a) dx, see Bardet & Bertrand (2007). Next, we assume that the signal is a 

J M 

Gaussian process, centered, with locally stationary increments given by the representation formula 



X(t) 



[ (e ltc -1) f 1/2 (t,£)dW(£), for all t e M, 

J R 



(2) 



where £ n> /(t, £) is an even and positive function, called spectral density piecewise constant, i.e., 
there exists a partition r% < r 2 < ■ ■ ■ < tk and Hurst parameters H = (Hq, H±, . . . , Hk) such that 
/(*,£) = /fc(0 = C(H k )\^\- 2H "- 1 for i g [r fe ,r fc+1 [ and C{H k ) = T,- l H k T{2H k ) sm(nH k ). Thus 
the series log \d^(a, bi)\ 2 where d^(a, b) is defined by (TTJ and b L = i E IN has a piecewise constant 
mean ^ and a finite known variance, more precisely, one has log \d^(a, bi)\ 2 = ^ + where 

In / |-0(a;)| 2 f k (x/a) dx if 



are weakly dependent r.v. of law In \U\ 2 with {/ ~ A/"(0, 1) and //j 
h € [r fe - aii, Tfc+i - aL 2 ]- 



Numerical simulation 

First, for T = 10 5 , we have simulated one realization of process (X (t))t£[o.T] with five change points 
r = {12500, 25496, 43045, 70083, 82040} and Hurst parameters H = (0.55', 0.67, 0.53, 0.61, 0.7, 0.57). 
Let us stress that, after having changed the scale in order to obtain Hurst index belonging to (0, 1), 
the configuration of change points and means is the same as in Section 1. 



A M 

w 



Figure 5: One replication of the process (^(0)te[o,T] 



Next, for a frequency 1/a = 0.2Hz and by using the Daubechies wavelet of order 6, we have 
calculated the wavelet coefficients (d^(a, bo), . . . , d^(a, £>iv)) with b k = k. Figure 5 below displays the 
sequence (Yq, . . . , Y'n) where Y k = log \d^(a, b k )\ 2 ■ Then, we have calibrated the Filtered derivative 
algorithm with A = 500 and a C ritic = 10~ u . We observe that the detected change points perfectly 
fit the theoretical configuration of changes. 




Figure 6: Segmentation of the sequence (Yq, . . . , Yjv) 



Note that we can not use penalized least square criterion due to the size of data, indeed PLSC 
would have need 80GB which is almost 150 times our computer memory capacity. 
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3 Application to real data 

Recent measurement methods allow us to access to electrocardiograms (ECG) for healthy people 
over a long period of time: marathon runners, daily (24 hours) records, etc. These large data sets 
allow us to characterize the variation of the heartbeat rate in the parasympathetic frequency band 
(0.15 Hz, 0.5 Hz). According to the recommendations of the Task Force of Cardiologists [14] . this 
frequency band corresponds to the parasympathetic system of control of the heartbeat. Moreover, 
the spectral density of the heart beat time series follows a power law, thus after having substract 
its mean, this series can be modelized by (2). Figure 7 provides an example of interbeat time 
series record on an healthy subject during 24 hours. We have calculated its wavelet coefficients and 
used the Filtered derivative algorithm with A — 500 and a cr iu c = 10 -11 . We obtain the following 
segmentation: r = {14435, 21903, 28003, 31984, 33377, 37274, 40470, 42306, 73153} 




Figure 7: Segmentation of the heart interbeat for healthy subjects during a period of 24 hours 



In future works, we will investigate sequential detection of change points of the Hurst index in 
connection with the cardiac behavior of sick subject. 
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